An Application of Operational Research to Computational Linguistics: Word Ambiguity

Authors

  • Kevin Durda
  • Richard J. Caron
  • Lori Buchanan
Abstract

This paper draws on graph theory and optimization techniques to develop a new measure of word ambiguity (e.g., homonymy and polysemy) for use in psycholinguistic research. The measure quantifies the uncertainty about the intended meaning of an English word. Specifically, data on sixty-four thousand distinct words were collected from a corpus of close to three hundred million words. These data are used to generate word-association information, which forms the basis for semantic graphs from which clusters are created and analyzed. The clusters identify groups of words related to the different meanings of a word and are used to calculate a set of relative probabilities for those meanings. These probabilities are in turn used to calculate the information entropy of the word, which acts as a surrogate measure of ambiguity. A genetic algorithm is used to determine optimal parameters for our word-association formula and for the graph clustering algorithm. The effectiveness of this application is demonstrated with examples from psycholinguistic research.

Keywords: computational linguistics, word ambiguity, graph clustering, genetic algorithm
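
The final step described in the abstract, turning relative meaning probabilities into an entropy score, can be illustrated with a minimal sketch. The abstract does not say how the probabilities are derived from the clusters, so the sketch assumes (hypothetically) that each meaning's probability is proportional to a non-negative cluster weight such as cluster size; the function name `meaning_entropy` and the example weights are illustrative and not taken from the paper.

```python
import math


def meaning_entropy(cluster_weights):
    """Shannon entropy, in bits, of the relative probabilities of a word's meanings.

    Assumption (not from the paper): each meaning's probability is
    proportional to a non-negative weight for its cluster, e.g. cluster size.
    """
    total = sum(cluster_weights)
    if total <= 0:
        raise ValueError("cluster weights must sum to a positive value")
    # Normalize weights to probabilities; skip empty clusters (0 * log 0 := 0).
    probs = [w / total for w in cluster_weights if w > 0]
    return sum(-p * math.log2(p) for p in probs)


# A word with one dominant meaning scores lower than a word whose
# meanings are evenly balanced.
unambiguous = meaning_entropy([10])       # single cluster
balanced = meaning_entropy([5, 5])        # two equally weighted clusters
skewed = meaning_entropy([9, 1])          # one dominant cluster
```

Under this assumption, a word with a single cluster has entropy zero, and entropy grows as the meanings become more numerous and more evenly weighted, which is the sense in which entropy can serve as a surrogate for ambiguity.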

Similar resources

AXEL : a framework to deal with ambiguity in three-noun compounds

Cognitive Linguistics has been widely used to deal with the ambiguity generated by words in combination. Although this domain offers many solutions to address this challenge, not all of them can be implemented in a computational environment. The Dynamic Construal of Meaning framework is argued to have this ability because it describes an intrinsic degree of association of meanings, which in tur...

A Statistically Emergent Approach for Language Processing: Application to Modeling Context Effects in Ambiguous Chinese Word Boundary Perception

This paper proposes that the process of language understanding can be modeled as a collective phenomenon that emerges from a myriad of microscopic and diverse activities. The process is analogous to the crystallization process in chemistry. The essential features of this model are: asynchronous parallelism; temperature-controlled randomness; and statistically emergent active symbols. A computer...

Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations

The main task of the tokenization is to divide the sentences of the text into its constituent units and remove punctuation marks (dots, commas, etc.). Each unit is a continuous lexical or grammatical writing chain that is an independent semantic unit. Tokenization occurs at the word level and the extracted units can be used as input to other components such as stemmer. The requirement to create...

Learning Morpho-Lexical Probabilities from an Untagged Corpus with an Application to Hebrew

This paper proposes a new approach for acquiring morpho-lexical probabilities from an untagged corpus. This approach demonstrates a way to extract very useful and nontrivial information from an untagged corpus, which otherwise would require laborious tagging of large corpora. The paper describes the use of these morpho-lexical probabilities as an information source for morphological disambiguat...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

Journal:
  • INFOR

Volume 48

Publication date: 2010